-
Notifications
You must be signed in to change notification settings - Fork 4.8k
HIVE-29177: Implement default Catalog selection #6088
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
a70484b to
5a64e40
Compare
5a64e40 to
06335d9
Compare
b610e0f to
4d48164
Compare
|
| RESOURCE_PLAN_NOT_EXISTS(10418, "Resource plan {0} does not exist", true), | ||
| INCOMPATIBLE_STRUCT(10419, "Incompatible structs.", true), | ||
| OBJECTNAME_CONTAINS_DOT(10420, "Table or database name may not contain dot(.) character", true), | ||
| OBJECTNAME_CONTAINS_DOT(10420, "Catalog or table or database name may not contain dot(.) character", true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use comma: Catalog, database or table names
|
|
||
| private final String catalogName; | ||
|
|
||
| public SwitchCatalogDesc(String databaseName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
catalogName ?
| public CreateDatabaseDesc(String catalogName, String databaseName, String comment, String locationUri, String managedLocationUri, | ||
| boolean ifNotExists, Map<String, String> dbProperties) { | ||
| this(databaseName, comment, locationUri, managedLocationUri, ifNotExists, dbProperties, "NATIVE", null, null); | ||
| this(catalogName, databaseName, comment, locationUri, managedLocationUri, ifNotExists, dbProperties, "NATIVE", null, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the external catalogs dbtype should be "REMOTE", right? It seems that we always hardcode "NATIVE" and below condition never returns true
if (dbtype != null && dbtype.equalsIgnoreCase("REMOTE")) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the external catalogs dbtype should be "REMOTE", right?
I don't think so. REMOTE dbtyp was introduced by HIVE-24396 data connector. If we implement all tasks about federated catalog, we can deprecate data connector. Because the data connector can only map external databases to internal databases by manually creating remote databases one by one to query external data sources, like this:
CREATE REMOTE DATABASE db_map_mysql USING mysqlconnector with DBPROPERTIES("connector.remoteDbName"="testmysqldb");
whereas the federated catalog can retrieve all external databases at once without the need to manually create remote databases, once we create a catalog.
It seems that we always hardcode "NATIVE" and below condition never returns true
If you create a remote database with a specified dataconnector, this condition will return true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we introduce a flag like native/non-native similar to tables?
| throw new SemanticException(ErrorMsg.CATALOG_NOT_EXISTS, catalogName); | ||
| } | ||
| String databaseName = catDbNamePair.getRight(); | ||
| Database database = getDatabase(catDbNamePair.getLeft(), catDbNamePair.getRight(), !ifExists); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use local vars?
getDatabase(catalogName, databaseName, !ifExists)
| // Unlock database operation is to release the lock explicitly, the operation itself don't need to be locked. | ||
| // Set the WriteEntity as WriteType: DDL_NO_LOCK here, otherwise it will conflict with Hive's transaction. | ||
| outputs.add(new WriteEntity(getDatabase(databaseName), WriteType.DDL_NO_LOCK)); | ||
| outputs.add(new WriteEntity(getDatabase(catalogName, databaseName, true), WriteType.DDL_NO_LOCK)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HIVE_LOCKS table doesn't have catalog column.
Maybe some operation like Unlock/Lock DB, ShowDbLocks should only be supported by NATIVE DBs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HIVE_LOCKS table doesn't have catalog column.
Currently, this only retrieves a specific database under a particular catalog and does not involve any lock/unlock operations on the catalog.
However, I believe future development should consider implementing lock/unlock operations for catalogs, though I haven't yet identified the specific use cases. It might be necessary during cross-catalog federated queries.
Maybe some operation like Unlock/Lock DB, ShowDbLocks should only be supported by NATIVE DBs?
To be honest, I haven't fully thought this through. Locking/unlocking databases in external catalogs (such as a MySQL JDBC catalog) doesn't seem very meaningful, as these lock operations won't take effect on the MySQL source side. However, I believe allowing locks on databases in external catalogs in Hive might have some significance, particularly in scenarios involving cross-catalog and cross-database operations, to ensure task transactional consistency. Therefore, let's further explore this issue when we work on cross-catalog queries. An already created ticket, HIVE-29242, can be used to track this matter.
| throw new HiveException("New lock format only supported with db lock manager."); | ||
| } | ||
|
|
||
| // TODO catalog. Need to add catalog into ShowLocksRequest. But ShowLocksRequest doesn't have catalog field. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replied in #6088 (comment)
| Map<String, String> props = new HashMap<>(); | ||
| props.put(READONLY, Boolean.TRUE.toString()); | ||
|
|
||
| // TODO catalog. Need to double check the actual catalog here. Depend on HIVE-29278 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't think we can replicate not NATIVE DBs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Some task can only do in NATIVE DBs. Like this ReploadTask.
This TODO is added to verify if the correct native catalog name is obtained. BTW, besides the default hive native catalog, we still need to support other multiple native catalogs.
As for external catalogs, we need to skip those tasks that are exclusive to native catalogs. However, the specific types of external catalogs (e.g., JDBC, REST) have not yet been defined, and this part of the work can be refined and completed subsequently. HIVE-29243 can be used to track this matter.
| AlterDatabaseSetPropertiesDesc alterDbDesc = new AlterDatabaseSetPropertiesDesc(dbName, mapProp, | ||
| new ReplicationSpec(replState, replState)); | ||
| // TODO catalog. Need to double check the actual catalog here. Depend on HIVE-29278 | ||
| AlterDatabaseSetPropertiesDesc alterDbDesc = new AlterDatabaseSetPropertiesDesc(HiveUtils.getCurrentCatalogOrDefault(conf), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@harshal-16 could you please validate if this is ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checked the details in the main JIRA: https://issues.apache.org/jira/browse/HIVE-28879
As ACID tables can only reside in Hive (default), other catalogs don't make sense in case of replication here
Also, during dump operation itself, it filters only default catalog here:
| new CatalogFilter(MetaStoreUtils.getDefaultCatalog(conf)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for checking, @harshal-16 ! what if we register multiple external HMS catalogs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, then I think we will need to update the incremental dump filter also to make it consistent with the load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HIVE-29278 is used to track this matter. I think we can make the rep work with multiple native hms catalog.
|
|
||
| HiveLock lck = lockMgr.lock(new HiveLockObject(dbObj.getName(), lockData), mode, true); | ||
| // Using the catalogName@databaseName format to uniquely identify a database. | ||
| HiveLock lck = lockMgr.lock(new HiveLockObject(catName + "@" +dbObj.getName(), lockData), mode, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, nice hack! however, maybe we should use "." ?
I think it might actually work. i am not sure if show locks would handle this change, might need some tweaks:
show locks db1
show locks cat1.db1
note, might be problematic if CU has multiple HMS versions on a cluster
nit: please add space + dbObj.getName()
| throw new HiveException("Database " + dbName + " does not exist "); | ||
| } | ||
| HiveLockObject obj = new HiveLockObject(dbObj.getName(), null); | ||
| HiveLockObject obj = new HiveLockObject(catName + "@" +dbObj.getName(), null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space
| * @throws HiveException | ||
| * @throws NoSuchObjectException | ||
| */ | ||
| @Deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add to java doc "@deprecated since 4.1.0, will be removed in 5.0.0"
|
|
||
| // TODO: this whole path won't work with catalogs | ||
| /** | ||
| * When you call this method, you need to ensure that the catalog has been set in the db object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not have fallback similar to https://github.com/apache/hive/pull/6088/files#diff-9d6bc2f1a2972d844b3107f29bc1731af7adf6f4c922e08904345c5ae88cd727R1443
| } | ||
| } | ||
|
|
||
| public List<Table> getTableObjects(String catName, String dbName, String pattern, TableType tableType) throws HiveException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dengzhhu653 is it ok or we should use some request object like Database?
| * @return list of table names that match the pattern. | ||
| * @throws HiveException | ||
| */ | ||
| public List<String> getTablesByType(String catName, String dbName, String pattern, TableType type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as https://github.com/apache/hive/pull/6088/files#r2454680968
can we reuse it in getTablesByType(String dbName, String pattern, TableType type) ?
| } | ||
| } | ||
|
|
||
| public List<Function> getFunctionsInDb(String catName, String dbName, String pattern) throws HiveException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| } | ||
|
|
||
| public static String getCurrentCatalogOrDefault(Configuration conf) { | ||
| return SessionState.get() != null ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could try Optional.ofNullable
| return this.tTable.getCatName(); | ||
| } | ||
|
|
||
| public void setCatalogName(String catalogName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this setter on a Table object?
| return database; | ||
| } | ||
|
|
||
| protected Database getDatabase(String catalogName, String dbName, boolean throwException) throws SemanticException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try reuse code. you could call new method within the old one without catalogName
| Database db = metaData.getDatabase(); | ||
| String destinationDBName = | ||
| context.dbName == null ? db.getName() : context.dbName; | ||
| String destinationCatalogName = db.getCatalogName(); // TODO catalog. Need to double check the catalog here. Depend on HIVE-29278 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @harshal-16
| } else if (command.length > 1 && "show".equalsIgnoreCase(command[0]) && | ||
| "processlist".equalsIgnoreCase(command[1])) { | ||
| return PROCESSLIST; | ||
| } else if(command.length > 1 && "set".equalsIgnoreCase(command[0]) && "catalog".equalsIgnoreCase(command[1])) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space
| @@ -0,0 +1,9 @@ | |||
| set hive.lock.numretries=0; | |||
| set hive.support.concurrency=true; | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is only needed for ACID tables when DBTxnManager is used
| DESC CATALOG EXTENDED test_cat; | ||
|
|
||
| -- DROP catalog at the end | ||
| DROP CATALOG test_cat; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant space
|
thanks @zhangbutao! in general LGTM, added some comments |
|
Sorry for the late reply.
It seems to me this is a significant/big change, and the user should follow a new way of catalog.db.table to request the table as the Trino/Presto does today. I think we can have a more simple idea, let's push the catalog awareness down into the Metastore client. For example, For cross-catalog queries, we can also do it in a similar way, we can introduce a new catalog awareness lawyer between session client and the thrift client, as SessionHiveMetaStoreClient -> CatalogAwarenessMetastoreClient -> ThriftHiveMetaStoreClient. The CatalogAwarenessMetastoreClient can acknowledge of the catalog through configuration or properties stored in HMS(HIVE-27186). Something as By this way, we don't need to introduce the change on the Hive SQL syntax and test it throughly, and other engines(such as Impala) can also benefit from this way to add the support on cross-catalog queries. |
|
The biggest flaw is we cannot have two same db name/table name among catalogs, this might be a question in the future. |



What changes were proposed in this pull request?
This PR first implements the functionality for clients to switch catalogs, such as
SET CATALOG testcat;. Subsequently, created databases or retrieved database lists will all reside under thetestcatcatalog.The primary changes in this PR involve modifying database-related DDL interfaces to ensure that operations like
CREATE/ALTER/DROP DATABASErespect the client's current catalog.You can check the q file catalog_database.q to see some syntax about cat.db. Such as:
This PR added some
TODO cataloglabels, as some catalog tasks need to be fixed in other tickets. You can searchTODO catalogto find which task should be done later.Additionally, this PR only addresses the interfaces for switching catalogs and databases, and does not cover interfaces related to
cat.db.table. Therefore, even if the catalog and database are switched, the current client will still query tables from the defaulthivecatalog. The code related tocat.db.tbl(part of the work in HIVE-29177) will also involve many table-related interfaces, which presents certain refactoring challenges.Why are the changes needed?
We need to make database work with catalog to support multiple catalogs in client simultaneously.
Does this PR introduce any user-facing change?
No. Users still can use classical syntax such as
use dbname.How was this patch tested?
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile='catalog_database.q' -Dtest.output.overwrite=true -pl itests/qtest -Pitests